factor model
Learning Nonlinear Factor Models with Unknown Monotone Links from Incomplete and Noisy Data
Chao, Yutong, Gökhan, Resat, Etesami, Jalal, Habibnia, Ali
We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability issues. The link function is assumed to lie in a reproducing kernel Hilbert space (RKHS), enabling flexible nonparametric modeling while preserving identifiability. We formulate the problem as the joint recovery of the low-rank factors, loadings, and the nonlinear link function from possibly incomplete and noisy observations, and we propose a projected block coordinate descent (BCD) algorithm with explicit regularization to address scale and rotational ambiguities. Under a mild incoherence condition on the factors and standard sampling conditions, we establish convergence guarantees in both noiseless and noisy regimes, along with sublinear regret bounds for the link-function updates. Our results extend classical linear factor models to a broad nonlinear regime and provide a principled framework for learning nonlinear latent structures. We evaluate the proposed approach in controlled synthetic experiments, which indicate promising performance.
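To make the setting above concrete, the following is a minimal sketch of the data-generating side only: low-rank latent factors passed through a monotone link, with additive noise and missing entries. It is not the paper's BCD algorithm. The function name, the `tanh` link, the Gaussian factors and noise, and the uniform-at-random observation mask are all assumptions chosen for illustration. The final check illustrates the rotational ambiguity the abstract mentions: rotating factors and loadings by the same orthogonal matrix leaves the latent low-rank matrix unchanged.

```python
import numpy as np

def simulate_nonlinear_factor_data(n=50, p=40, rank=3, noise_sd=0.1,
                                   observe_prob=0.7, link=np.tanh, seed=0):
    """Hypothetical helper: draw Y = link(F L^T) + noise, observed on a random mask."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, rank))    # latent factors F (assumed Gaussian)
    loadings = rng.standard_normal((p, rank))   # loadings L (assumed Gaussian)
    latent = factors @ loadings.T / np.sqrt(rank)
    # Monotone link applied entrywise, plus additive Gaussian noise.
    noisy = link(latent) + noise_sd * rng.standard_normal((n, p))
    # Each entry is observed independently with probability observe_prob.
    mask = rng.random((n, p)) < observe_prob
    observed = np.where(mask, noisy, np.nan)    # NaN marks a missing entry
    return observed, mask, factors, loadings

observed, mask, factors, loadings = simulate_nonlinear_factor_data()

# Rotational ambiguity: (F Q)(L Q)^T equals F L^T for any orthogonal Q.
rotation, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((3, 3)))
same_latent = np.allclose((factors @ rotation) @ (loadings @ rotation).T,
                          factors @ loadings.T)
print(same_latent)
```

Because the observations depend on the factors and loadings only through their product, any estimator has to fix such rotations (and the scale of the link) by convention, which is the role the abstract assigns to explicit regularization.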
Factor Augmented High-Dimensional SGD
Li, Shubo, Han, Yuefeng, Yu, Xiufan
Stochastic gradient descent (SGD) has been a cornerstone of machine learning since the pioneering work of Robbins & Monro (1951). Beyond its algorithmic simplicity and scalability, SGD has also become a central object of theoretical study, with refined analyses linking its dynamics to implicit regularization, generalization performance, and algorithmic stability. For decades, theoretical analyses of SGD have largely resided within the realm of classical stochastic approximation (Polyak & Juditsky, 1992; Lai, 2003; Bottou et al., 2018), where the data dimension is considered fixed while the sample size tends to infinity. While this regime has yielded foundational insights, it no longer fully reflects the characteristics of modern learning systems. Contemporary applications often operate in regimes where data dimension, sample size, and model complexity grow together, calling for new theoretical tools and perspectives that go beyond traditional asymptotic analyses. In this study, we focus on learning tasks involving high-dimensional predictors. When SGD is applied directly to such data, the dimensionality of the feature space propagates into the optimization process, resulting in a high-dimensional (HD) parameter space. Algorithmically, one increasingly popular strategy is to approximate the gradient updates using a low-rank representation to reduce memory costs and accelerate computation (Wang et al., 2018; Vogels et al., 2019; Kozak et al., 2019; Kasiviswanathan, 2021; Zhao et al., 2024). Theoretically, despite the vast literature on SGD, convergence guarantees for HD-SGD remain limited (Garrigos & Gower, 2023; Li et al., 2025).
Checklist
A.2: Comparison of the causal assumptions
A.3: Comparison of allowed temporal covariates
A.4: Unrelated works with similar terminology
The SyncTwin algorithm.
A.5: The generality of SyncTwin's assumed DGP
A.6: Estimation for control and new individuals
A.7: Algorithmic details and pseudocode
A.8: Optimization for the matching loss Lm
Simulation study.
Tucker Diffusion Model for High-dimensional Tensor Generation
Guo, Jianhua, Kong, Xinbing, Li, Zeyu, Mao, Junfan
Statistical inference on large-dimensional tensor data has been extensively studied in the literature and is widely used in economics, biology, machine learning, and other fields, but how to generate a structured tensor with a target distribution is still a new problem. As powerful generative models, diffusion models have achieved remarkable success in learning complex distributions. However, their extension to generating multi-linear tensor-valued observations remains underexplored. In this work, we propose a novel Tucker diffusion model for learning high-dimensional tensor distributions. We show that the score function admits a structured decomposition under the low Tucker rank assumption, allowing it to be both accurately approximated and efficiently estimated using a carefully tailored tensor-shaped architecture named Tucker-Unet. Furthermore, the distribution of generated tensors, induced by the estimated score function, converges to the true data distribution at a rate depending on the maximum of the tensor mode dimensions, thereby offering a clear theoretical advantage over the naive vectorized approach, whose rate depends on their product. Empirically, compared to existing approaches, the Tucker diffusion model demonstrates strong practical potential in synthetic and real-world tensor generation tasks, achieving comparable and sometimes even superior statistical performance with significantly reduced training and sampling costs.
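As a rough illustration of the low Tucker rank assumption above, the sketch below builds a 3-way tensor from a small core and three factor matrices, then compares the number of free parameters with the size of the vectorized tensor. It does not implement Tucker-Unet or the score decomposition, and parameter counts are not the paper's convergence rate; the dimensions, ranks, and the helper name `tucker_reconstruct` are assumptions chosen for illustration.

```python
import numpy as np

def tucker_reconstruct(core, factor_matrices):
    """Hypothetical helper: multiply a 3-way core by one factor matrix per mode."""
    u1, u2, u3 = factor_matrices
    # tensor[i, j, k] = sum_{a,b,c} core[a, b, c] * u1[i, a] * u2[j, b] * u3[k, c]
    return np.einsum("abc,ia,jb,kc->ijk", core, u1, u2, u3)

dims = (30, 40, 50)    # tensor mode dimensions (assumed)
ranks = (3, 4, 5)      # Tucker ranks (assumed)
rng = np.random.default_rng(0)
core = rng.standard_normal(ranks)
factor_matrices = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]
tensor = tucker_reconstruct(core, factor_matrices)

# The Tucker form stores the core plus one (dimension x rank) matrix per mode,
# while the vectorized tensor stores every entry.
tucker_params = core.size + sum(u.size for u in factor_matrices)
dense_params = tensor.size
print(tucker_params, dense_params)
```

With these sizes the Tucker form has 560 parameters against 60000 tensor entries. The structured count grows with the sum of the mode dimensions, whereas the vectorized count grows with their product, which gives some intuition for why exploiting Tucker structure can help.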
Beyond identifiability: Learning causal representations with few environments and finite samples
Lee, Inbeom, Jin, Tongtong, Aragam, Bryon
We provide explicit, finite-sample guarantees for learning causal representations from data with a sublinear number of environments. Causal representation learning seeks to provide a rigorous foundation for the general representation learning problem by bridging causal models with latent factor models in order to learn interpretable representations with causal semantics. Despite a blossoming theory of identifiability in causal representation learning, estimation and finite-sample bounds are less well understood. We show that causal representations can be learned with only a logarithmic number of unknown, multi-node interventions, and that the intervention targets need not be carefully designed in advance. Through a careful perturbation analysis, we provide a new analysis of this problem that guarantees consistent recovery of (a) the latent causal graph, (b) the mixing matrix and representations, and (c) \emph{unknown} intervention targets.
Efficient Evaluation of LLM Performance with Statistical Guarantees
Wu, Skyler, Nair, Yash, Candès, Emmanuel J.
Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference -- a finite-population extension of active inference (Zrnic & Candès, 2024) that enables direct question selection while preserving coverage. With negligible overhead cost, FAQ delivers up to $5\times$ effective sample size gains over strong baselines on two benchmark suites, across varying historical-data missingness levels: this means that it matches the CI width of uniform sampling while using up to $5\times$ fewer queries. We release our source code and our curated datasets to support reproducible evaluation and future research.
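To unpack the effective-sample-size statement above, here is a minimal sketch of the uniform-sampling baseline only, not FAQ itself. It computes a normal-approximation CI half-width for a model's accuracy on a finite benchmark when questions are sampled uniformly without replacement. The helper name, the 0.7 accuracy, the benchmark size, and the query budgets are assumptions chosen for illustration.

```python
import math

def uniform_ci_halfwidth(accuracy, n_queries, population_size, z=1.96):
    """Hypothetical helper: normal-approximation CI half-width for a
    finite-population mean under uniform sampling without replacement."""
    variance = accuracy * (1.0 - accuracy)    # variance of a 0/1 outcome
    # Finite-population correction: the variance shrinks to zero
    # once every question in the benchmark has been queried.
    fpc = (population_size - n_queries) / (population_size - 1)
    return z * math.sqrt(variance / n_queries * fpc)

benchmark_size = 10_000    # assumed number of questions
full_budget = uniform_ci_halfwidth(0.7, 1000, benchmark_size)
fifth_budget = uniform_ci_halfwidth(0.7, 200, benchmark_size)
print(full_budget, fifth_budget)
```

Under uniform sampling, cutting the budget from 1000 to 200 queries widens the interval by roughly a factor of the square root of 5 when the benchmark is large. Read this way, a 5x effective sample size gain means a method that spends 200 queries yet attains about the width uniform sampling would need 1000 queries to reach.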